Abstract

We consider rule sets for internet packet routing and filter- 
ing, where each rule consists of a range of source addresses, 
a range of destination addresses, a priority, and an action. 
A given packet should be handled by the action from the 
maximum priority rule that matches its source and destina- 
tion. We describe new data structures for quickly finding the 
rule matching an incoming packet, in near-linear space, and 
a new algorithm for determining whether a rule set contains 
any conflicts, in time $O(n^{3/2})$.

1 Introduction 

The working of the current Internet and its posited evolution 
depend on efficient packet filtering mechanisms: databases 
of rules, maintained at various parts of the network, which 
use patterns to filter out sets of IP packets and specify ac- 
tions to be performed on those sets. Typical filter patterns 
are based on packet header information such as the source 
or destination IP addresses. The actions to be performed de- 
pend on where the packet filtering is performed in the net- 
work. For example, at backbone routers, packet filters spec- 
ify which interface or link to use when forwarding packets. 
In firewalls, packet filters specify whether to allow a con- 
nection. More generally, packet filters specify Quality-of- 
Service actions such as restricting certain classes of traffic 
to no more than a threshold bandwidth. This packet filtering 
mechanism — maintaining a database of filters with asso- 
ciated actions and applying them to IP packets as appropri- 
ate — underlies most crucial aspects of the Internet: correct 
routing, providing security, guaranteeing service level agree- 
ments between different subnets, billing based on traffic pat- 
terns, etc. 

Implementing the packet filtering mechanism in the In- 
ternet involves sophisticated packet filter management tasks. 
In particular, we need packet classification, that is, given an 
IP packet with specific header values, we need to determine
which filter applies to that packet. We also need filter
conflict detection, that is, we need to determine whether two or
more filters that apply to a packet specify conflicting actions.
Conflicts are resolved by adding additional filters, so the fil- 
ter database remains consistent. These are the fundamen- 
tal packet filter management tasks governing the IP network 
performance. 

In this paper, we present efficient algorithms for solving 
both of these packet filter management problems. Our ap- 
proach is to solve the underlying abstract problem which, in 
each case, is naturally formulated as a geometric data struc- 
tural problem. We focus on simple techniques suitable for 
highly efficient implementations, especially in our packet 
classification algorithms, because in the future we hope to 
explore implementations of them in practical applications. 
However, our work provides theoretical asymptotic improvements
as well.

The same abstract geometric data structural problems
derived from these packet filtering applications arise
independently in other important application areas as well, and
our results improve the best known results for those
applications. In what follows, we describe the packet filter
management problems (Section 1.1) and our results (Section 1.2),
and provide an overview of our techniques (Section 1.3)
before providing all the details (Sections 2 to 3). We will briefly
describe the other application areas where our results are
relevant in Section 1.3.

*Dept. Inf. & Comp. Sci., Univ. of California, Irvine, CA 92697-3425.
Email: eppstein@ics.uci.edu. Work performed in part while
visiting AT&T.

†AT&T Labs, Shannon Laboratory, 180 Park Ave., Florham Park, NJ
07932. Email: muthu@research.att.com.

1.1 Packet Filter Management Problems 

A packet filter i in IP networks is a collection of
d-dimensional ranges $[l_i^1, r_i^1] \times \cdots \times [l_i^d, r_i^d]$, an action $A_i$, and a
priority. The precise nature of the action is not relevant here
except that we can determine if two actions $A_i$ and $A_j$ are in
conflict (for example, if $A_i$ is to allow the packet through the
firewall and $A_j$ is to disallow it, there is a conflict of action).
Any IP packet P can be viewed as a d-dimensional vector of
values $[P_1, \ldots, P_d]$ summarizing the header information of
the packet. A filter i applies to packet P if $P_j \in [l_i^j, r_i^j]$ for
each $j \in [1, d]$.

Packet Classification Problem. A database F of filters 
is available for preprocessing. Each online query is 
a packet P, and the goal is to classify it, that is, to 
determine the filter of highest priority that applies to 
P. A related problem is to list all filters that apply to P.

Filter Conflict Detection Problem. Given a database F
of filters, determine if there exists any packet P such
that, among the filters of highest priority that apply to P,
some two specify actions that conflict. Related
problems are to list all regions wherein conflicting P's
lie, or to list all conflicting pairs of filters.
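As a point of reference, the two problem statements can be written down as a brute-force sketch. This is not one of the algorithms of this paper; it only fixes the semantics. The names `Filter`, `applies`, `classify`, and `conflicts_at` are illustrative, and the sketch assumes, for simplicity, that two actions conflict exactly when they differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Filter:
    ranges: tuple    # ((lo_1, hi_1), ..., (lo_d, hi_d)), inclusive bounds
    priority: int
    action: str      # assumption: two actions conflict iff they differ

def applies(f, packet):
    """True if every header field P_j lies in the j-th range of filter f."""
    return all(lo <= p <= hi for (lo, hi), p in zip(f.ranges, packet))

def classify(filters, packet):
    """Return all filters of highest priority that apply to the packet."""
    matching = [f for f in filters if applies(f, packet)]
    if not matching:
        return []
    top = max(f.priority for f in matching)
    return [f for f in matching if f.priority == top]

def conflicts_at(filters, packet):
    """True if the highest-priority filters at this packet disagree."""
    return len({f.action for f in classify(filters, packet)}) > 1

example = [
    Filter(((0, 9), (0, 9)), 1, "allow"),
    Filter(((5, 14), (5, 14)), 1, "deny"),
    Filter(((6, 7), (6, 7)), 2, "allow"),
]
```

Each call scans all n filters, so this is only a specification against which faster structures can be checked.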

Some remarks follow. Existing IP routers use destina- 
tion based routing, that is, use filters with d = 1 specifying 
ranges of destination IP addresses. As the Internet evolves 
from being the best effort network as it is now to provide 
differentiated services, two or more IP header fields may be 
specified by a filter. Some proposals are underway to specify 
many fields such as source IP address, destination IP address, 
source port, destination port etc., while others are underway 
which seem to preclude using more than just the source and 
destination IP addresses, that is, d = 2 (in IPsec for exam- 
ple, the source or destination port numbers may not be re- 
vealed.) In the rest of this paper, we will assume d = 2 and 
the fields that are specified are source and destination IP ad- 
dresses since that seems likely to be most prevalent and of 
immediate interest. 

Filters typically specify IP address ranges as an IP
address $a_1 \cdots a_{32}$ and a mask of a certain number $l$ of bits, that
is, the range is $a_1 \cdots a_l 0 0 \cdots 0$ to $a_1 \cdots a_l 1 1 \cdots 1$. So these
are not arbitrary ranges. Instead they are hierarchical, that 
is, if two ranges intersect, one is completely contained in the 
other. All our results will in fact work for arbitrary ranges 
in each dimension although some of our algorithms can be 
made simpler for implementation purposes if the ranges are 
hierarchical. 
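The following minimal sketch illustrates the address-and-mask form of a range and the hierarchical property. The function names `prefix_range` and `nested_or_disjoint` are hypothetical helpers introduced here for illustration only.

```python
def prefix_range(addr, length, bits=32):
    """Inclusive integer range covered by the first `length` bits of `addr`."""
    host_bits = bits - length
    lo = (addr >> host_bits) << host_bits      # a_1 ... a_l 0 0 ... 0
    hi = lo | ((1 << host_bits) - 1)           # a_1 ... a_l 1 1 ... 1
    return lo, hi

def nested_or_disjoint(r, s):
    """Hierarchical property: two ranges never overlap partially."""
    (a, b), (c, d) = r, s
    disjoint = b < c or d < a
    nested = (a <= c and d <= b) or (c <= a and b <= d)
    return disjoint or nested

wide = prefix_range(0xC0A80123, 16)    # 16-bit prefix of 192.168.1.35
narrow = prefix_range(0xC0A80123, 24)  # 24-bit prefix of the same address
```

Arbitrary ranges such as (0, 10) and (5, 20) fail the `nested_or_disjoint` test, which is why the general algorithms cannot rely on it.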

In both problems we will let n denote the number of 
filters in F. The value of n varies depending on where 
filtering is done: backbone routers may have hundreds of 
thousands of filters, firewalls may only have a few hundreds, 
etc. All numbers are integers in the range $[0, U-1]$; for
IP addresses, this is currently $[0, 2^{32}-1]$, but may go up to
$2^{64}$ or higher in IPv6.

1.2 Our Results 

Our main results are as follows. 

• Packet Classification Problem. We present an algo- 
rithm for this problem with different tradeoffs for data 
structure space vs filtering time. In particular, we ob- 
tain very fast classification times with near-linear space: 
with $O(n^{1+o(1)})$ space, classification takes $O(\log\log n)$
time, or with $O(n^{1+\epsilon})$ space, classification takes $O(1)$
time.¹

¹For clarity, we have stated the results for $U = n^{O(1)}$; bounds for
general U appear later in the paper. Throughout this section, we assume
$U = n^{O(1)}$ for making comparisons with existing results.



• Filter Conflict Detection Problem. We present an
$O(n)$ space, $O(n^{3/2})$ time algorithm for this problem.
Straightforward $O(n^2 \log n)$ time algorithms were the
only known previous result.

The packet classification problem has been extensively 
studied with over a dozen papers in the premier networking 
conferences (INFOCOM and SIGCOMM) in the past few 
years (e.g., see references in [Q]). Classification time is of 
paramount importance (for example, for backbone routers, 
filtering IP packets has to be done at the speed at which it 
forwards the packets, a blistering speed!). However, at such 
high speeds, memory is very expensive and the consensus 
in the networking community is that classification must be 
very fast, but that data structural space must be limited to 
the extent possible. The applied works in INFOCOM and 
SIGCOMM use near-linear space, but take time $\Omega(\log n)$ to
classify each packet, which they attempt to speed up further
using large memory cache lines, etc. However, the gold
standard has been the bound of $O(\log\log n)$ that can be
achieved for d = 1. With the exception of [0], known
algorithms for d = 2 fail to meet this bound. Our algorithmic
result here meets this bound, but uses only $O(n^{1+o(1)})$ space,
improving upon the $O(n^{1+\epsilon})$ space needed by that work, which
is the previously best known result. Furthermore, our result
is easily implementable; hence, it additionally holds promise 
as a practical packet classification solution. 

The filter conflict detection problem has received attention
only recently. That work was primarily motivated
by detecting security holes in firewalls. Filter databases in 
firewalls get modified by systems administrators manually 
or automatically (for example, when a host from inside a 
firewall requests a TCP connection with a host outside, a 
filter may be added to the firewall to enable the target host 
to open a TCP connection through the firewall). Conflicts 
arise quite naturally, and the task of the administrator is to 
resolve them appropriately. The work in [|J] was motivated 
by this scenario. However, conflict detection helps in auditing
filter databases in general for ambiguities in routing,
unfulfilled service guarantees, etc., that is, wherever a
packet filter mechanism is employed. It is straightforward to
solve this problem in $O(n^2 \log n)$ time. Our main contribution
here is in breaking the quadratic barrier and designing
an $O(n^{3/2})$ time algorithm.

1.3 Our Techniques and Other Applications of Our Results

Both the packet classification and the filter conflict detection 
problem can be thought of as geometric problems in which 
each rule is a 2-dimensional axis-parallel rectangle (in d
dimensions, they will be d-dimensional hyperrectangles). The
packet classification problem can be viewed as locating a
point (the query IP packet header values) in a partition of
space formed by overlaying these rectangles. The filter con- 
flict detection problem is that of detecting certain overlap- 
ping regions among rectangles of highest priority that over- 
lap a region. Our approach is to solve the underlying geo- 
metric data structural problems in the bounds quoted above. 
This has other immediate applications, for example to the 
problems in [g], giving the following new results: (1) faster 
multi-method lookup in object oriented languages and the 
first known efficient algorithm for auditing multi-method li- 
braries, (2) improved matching algorithms for rectangular 
matching, wherein, for the first time, matching time is in- 
dependent of the dictionary size while the space used is sub- 
quadratic in dictionary size, and (3) the first known optimal 
algorithm for approximately matching a pattern string with 
edit distance at most 1 in a text - the matching time is linear 
in the text size and preprocessed space is sublinear in the dic- 
tionary size. These three problems have extensive literature, 
and all these results are of independent interest. Readers are 
referred to for details. 

Our approach to solving the packet classification problem
relies on a standard plane-sweep approach to turn the static
two-dimensional rectangle query problem into a dynamic 
one-dimensional problem, in which we maintain a dynamic 
set of intervals and must again query the maximum priority 
set element containing a query point. This one-dimensional 
problem must be solved persistently, so we can query previ- 
ous versions of the data structure after the plane sweep has 
occurred. We solve this persistent one-dimensional problem 
using a data structure combining ideas from B-trees and seg- 
ment trees. 

Our approach to the filter conflict detection problem 
uses a technique related to an algorithm by Overmars and 
Yap [11] for Klee's measure problem (determining the volume
of a union of rectangular blocks): we use a kD-tree |Q
to divide the plane into rectangular cells, not containing any 
rectangle vertex, so that the rectangles intersecting any cell 
form stripes (i.e., rectangles that are unbounded in one di- 
mension). The conflict detection problem can thus be re- 
duced to determining a lower envelope of line segments, 
which can also be interpreted data structurally as an offline 
priority queue problem or graph theoretically as a minimum 
spanning tree verification problem. We solve this subprob- 
lem efficiently using a linear-time union-find data structure. 

2 Fast Packet Classification Queries 

As described above, packet classification can be viewed as an 
orthogonal range querying problem, in which we wish to find 
the maximum priority rectangle containing any query point. 
We now describe data structures for solving this problem 
efficiently. 



2.1 Persistent Interval Queries 

First, we consider a dynamic one-dimensional query prob- 
lem: what is the maximum priority interval containing a 
query point among a dynamically changing set of intervals, 
having integer endpoints in the range [0, U — 1]. We assume 
without loss of generality that U is a power of two. Our 
data structure will be partially persistent: an update must be 
performed on the most recent version of the structure, but a 
query can refer to any prior version. 

Our data structure will be parametrized by a value k,
and will consist of blocks of $O(2^k)$ memory words, each
corresponding to information about an interval of values
within the range $[0, U-1]$. An update may create new blocks
but will not change existing blocks. If a block corresponds
to query values in the interval $[x, y]$, then by subinterval i we
refer to the interval $[x + i(y-x)2^{-k},\ x + (i+1)(y-x)2^{-k} - 1]$.
A persistent version of the data structure will be represented
by a pointer to a block forming the top level of the data
structure.

Each block contains the following information: 

• A table opt[i] of pointers to the maximum-priority interval
in the dynamic set that contains subinterval i.

• A table pq[i] of pointers to priority queue data structures
for the intervals containing subinterval i.

• A table subblock[i] of pointers to blocks representing
the subset of dynamic intervals having endpoints in
subinterval i. If a subinterval contains no endpoints,
this pointer is null.

The priority queues are not used for queries, and so do 
not need to be maintained persistently. We will later see 
how to eliminate them altogether for problems derived from 
hierarchical rectangle sets. 

LEMMA 2.1. The data structure described above can find
the maximum priority interval containing a query point in
time $O((\log U)/k)$.

Proof. We answer a query simply by repeatedly following
the pointer subblock[i] for the subinterval i that contains the
query point. For each block found via this chain of pointers,
we look up the value opt[i] and compare the priorities of the
intervals found in this way.

Each successive block in the chain corresponds to an
interval of size smaller by a $2^{-k}$ factor than the previous
block, so the total number of blocks considered is $\log_{2^k} U =
(\log U)/k$. For any interval X containing the query point
there is a maximal block such that X contains the subinterval
containing the query in that block; then by the assumption
of maximality X must have an endpoint in the block and is
a candidate for opt[i]. Therefore, the true maximum-priority
interval containing the query is one of the ones found by the
query, and the query algorithm is correct. □

LEMMA 2.2. The data structure described above can
be updated in time $O((2^k \log n)(\log U)/k)$ and space
$O(2^k (\log U)/k)$.

Proof. To insert or delete an interval, we create a new copy
of each block containing one of the interval's endpoints. By
the same argument used to bound query time, there are at
most $2(\log U)/k$ such blocks. For each copied block, we update
the priority queues corresponding to subintervals contained in
the updated interval, copy pointers to these priority
queues into the pq[i] pointers of the new block, and use these
priority queues to set each value of opt[i]. We then copy
each pointer subblock[i] from the previous version of the
block, except for the one or two subintervals containing the
updated interval's endpoints, which are changed to point to
the new blocks for those subintervals. Each update causes
the creation of at most $2(\log U)/k$ new blocks, using space
$O(2^k (\log U)/k)$. Each update also changes $O(2^k (\log U)/k)$
priority queues, in time $O((2^k \log n)(\log U)/k)$. □

We summarize the results of this section: 

THEOREM 2.1. For any k there exists a data structure
for maintaining dynamic prioritized intervals in the
range $[0, U-1]$, and finding the maximum priority
interval containing a query point in any persistent version
of the data structure, in time $O((\log U)/k)$ per
query, time $O((2^k \log n)(\log U)/k)$ per update, and space
$O(2^k (\log U)/k)$ per update.
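The block structure can be sketched as follows, under simplifying assumptions that are ours and not part of the theorem: only insertions are supported (so the pq tables are omitted), U is a power of $2^k$, and each block is an immutable pair of tuples so that old versions remain valid after an update. The function names `insert` and `query` and the tuple layout are illustrative.

```python
def insert(block, lo, size, k, a, b, item):
    """Return a new version of the block covering [lo, lo+size-1] after
    inserting interval [a, b]; item is a (priority, label) pair.
    The old block is left untouched (path copying)."""
    fan = 1 << k
    if block is None:
        opt, sub = [None] * fan, [None] * fan
    else:
        opt, sub = list(block[0]), list(block[1])
    step = size // fan
    for i in range(fan):
        s_lo = lo + i * step
        s_hi = s_lo + step - 1
        if b < s_lo or a > s_hi:
            continue                      # subinterval i misses [a, b]
        if a <= s_lo and s_hi <= b:       # [a, b] contains subinterval i
            if opt[i] is None or item > opt[i]:
                opt[i] = item
        else:                             # an endpoint falls inside: recurse
            sub[i] = insert(sub[i], s_lo, step, k, a, b, item)
    return (tuple(opt), tuple(sub))

def query(block, lo, size, k, q):
    """Maximum-priority item whose interval contains q, in this version."""
    fan = 1 << k
    best = None
    while block is not None:
        step = size // fan
        i = (q - lo) // step
        cand = block[0][i]
        if cand is not None and (best is None or cand > best):
            best = cand
        block, lo, size = block[1][i], lo + i * step, step
    return best

U, K = 256, 2
v0 = None
v1 = insert(v0, 0, U, K, 10, 200, (1, "a"))
v2 = insert(v1, 0, U, K, 50, 60, (5, "b"))
```

Here `v1` and `v2` are two persistent versions: querying `v1` still ignores the interval added when `v2` was created. Supporting deletions would require the per-subinterval priority queues described above.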

The log n factor in the update time can be reduced by 
building a segment tree of subintervals within each block, 
and maintaining a priority queue of the dynamic intervals 
corresponding to each canonical interval of the segment tree; 
we omit the details, since this factor does not form an im- 
portant part of our overall running time and can (as detailed 
below) be avoided entirely for hierarchical rectangles. 

2.2 Static to Dynamic Transformation 

We use the dynamic data structure of the previous subsection 
to solve our static rectangle querying problem, as follows. 

LEMMA 2.3. Suppose we are given a set S of n integers
in the range $[0, U-1]$. Then for any x we can build a
data structure which finds the largest predecessor in S of a
given query integer, in space $O(n x \log_x U)$ and query time
$O(\log_x U)$.

Proof. Form a set of intervals $[i, U-1]$ with priority i for
$i \in S$. The maximum priority interval containing q has as its
left endpoint the predecessor of q. Thus, we can use a static
version of the data structure described in Theorem 2.1 (with
$k = \log x$) to solve this problem. □
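The reduction in this proof can be checked with a small sketch in which a linear scan stands in for the data structure of Theorem 2.1; only the reduction itself is illustrated. The function name `predecessor_via_intervals` is hypothetical.

```python
def predecessor_via_intervals(S, U, q):
    """Predecessor of q in S, found as the left endpoint of the
    maximum-priority interval [i, U-1] (priority i) that contains q."""
    intervals = [((i, U - 1), i) for i in S]           # (interval, priority)
    containing = [(prio, iv) for iv, prio in intervals
                  if iv[0] <= q <= iv[1]]
    if not containing:
        return None                                    # q precedes all of S
    _, (left, _) = max(containing)
    return left

S = {3, 8, 20}
```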



Beame and Fich [Qj provide matching
$\Theta(\log\log n / \log\log\log n)$ upper and lower bounds for
integer predecessor queries in polynomial space, and survey
several previous results on the problem. Because of the
reduction above, their lower bounds apply as well to the
maximum priority interval and rectangle problems. Our
results escape this lower bound by having a space bound
that depends on U and not just on n.

THEOREM 2.2. Given a set of n axis-aligned prioritized
rectangles with coordinates in the range $[0, U-1]$, and a
parameter x, we can build a data structure of size $O(n x \log_x U)$
which can find the maximum priority rectangle containing a
query point in time $O(\log_x U)$.

Proof. We consider a left-right sweep of the rectangles by a 
vertical line; for each position of the sweep line we maintain 
a dynamic set of intervals formed by the intersections of the 
rectangles with the sweep line. This intersection changes 
only when the sweep line crosses the left or right boundary 
of a rectangle; at the left boundary we insert the y-projection 
of the rectangle and at the right boundary we delete it. With 
each rectangle boundary we store a pointer to the version of 
the data structure formed when crossing that boundary. 

A query can be handled by using the integer predecessor
data structure of Lemma 2.3 to find the x-coordinate of the
nearest rectangle boundary to the right of the query point,
and then performing a query in the corresponding version of
the interval data structure. □
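The sweep can be sketched as follows. To keep the example short, each version is stored as a full snapshot of the active rectangles (quadratic space in the worst case) and queried by a linear scan; the persistent interval structure of Theorem 2.1 is what would replace both. As a further assumption of this sketch, a rectangle with inclusive x-extent [x1, x2] is deleted at x2 + 1, so that a predecessor search over event coordinates selects the right version. The names `build_sweep` and `sweep_query` are illustrative.

```python
import bisect

def build_sweep(rects):
    """rects: list of (x1, x2, y1, y2, priority), inclusive integer bounds.
    Returns sorted event x-coordinates and the active set after each one."""
    events = {}
    for idx, (x1, x2, _, _, _) in enumerate(rects):
        events.setdefault(x1, []).append((True, idx))       # insert at left
        events.setdefault(x2 + 1, []).append((False, idx))  # delete past right
    xs, versions, active = [], [], set()
    for x in sorted(events):
        for is_insert, idx in events[x]:
            if is_insert:
                active.add(idx)
            else:
                active.discard(idx)
        xs.append(x)
        versions.append(frozenset(active))   # snapshot stands in for a version
    return xs, versions

def sweep_query(xs, versions, rects, qx, qy):
    """Maximum-priority rectangle containing (qx, qy), or None."""
    v = bisect.bisect_right(xs, qx) - 1      # last event at or before qx
    if v < 0:
        return None
    hits = [rects[i] for i in versions[v] if rects[i][2] <= qy <= rects[i][3]]
    return max(hits, key=lambda r: r[4], default=None)

demo = [(0, 9, 0, 9, 1), (5, 14, 5, 14, 2)]
demo_xs, demo_versions = build_sweep(demo)
```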

In particular when $U = n^{O(1)}$ we achieve query time
$O(\log\log n)$ in space $O(n^{1+o(1)})$ (taking $x = n^{1/\log\log n}$), or query time $O(1)$ in
space $O(n^{1+\epsilon})$ (taking $x = n^{\epsilon}$), while previous solutions used space $O(n^{1+\epsilon})$
to achieve query time $O(\log\log n)$ [^J.

It is not difficult to modify our data structure to handle 
other decomposable queries, such as listing all rectangles 
containing the given query point, in similar time and space 
bounds. 

For hierarchical rectangles, we can simplify the dynamic 
interval data structure by using insertion and undo operations 
instead of more general insertions and deletions, and by 
omitting the pq[i] pointers and the priority queues they point
to. An insertion can be handled by comparing the priority of
the newly inserted interval to the values opt[i] for the blocks
containing the interval's endpoints. An undo can be handled 
simply by restoring the pointer to the top-level block to its 
previous version. 

3 Conflict Detection 

We say that a set of rules, represented by a set of rectangles 
with priorities, has a conflict if there exists a query point q 
such that there is not a unique maximum-priority rectangle 
containing q. Note that this is a stronger condition than the 
existence of an intersecting pair of equal-priority rectangles,
since a higher-priority rectangle could cover the intersection
and avoid a conflict. As defined in the introduction, the filter 
conflict detection problem further restricts conflicts to rules 
with conflicting actions; the algorithms described here can 
be extended to cases where the actions can be partitioned 
into a small number of conflict types but we omit the details. 

We would like to know whether a given set of prioritized 
rectangles has a conflict. A naive method would test each 
pair of equal-priority rectangles to determine whether they 
conflict, but this would not be efficient due to the difficulty 
of testing whether their intersection is covered by the union 
of higher priority rectangles. Less naively, the problem can 
be solved in near-quadratic time by querying each point 
determined by the horizontal boundary of one rectangle 
and the vertical boundary of another, or by constructing 
the arrangement of all the rectangles and using a priority 
queue to find the maximum priority rectangle(s) within 
each arrangement cell. We seek an even more efficient 
(subquadratic) solution. 
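For testing purposes, the definition of a conflict can be turned directly into a reference check. The sketch below is a simplified, cubic-time variant of the point-querying method just described: it scans all rectangles at each candidate point instead of using a query data structure. It assumes inclusive integer coordinates, so the set of rectangles containing a point can change only at coordinates x1, x2 + 1, y1, and y2 + 1. The function name `has_conflict_reference` is hypothetical.

```python
def has_conflict_reference(rects):
    """rects: list of (x1, x2, y1, y2, priority), inclusive integer bounds.
    True if some point has two or more maximum-priority rectangles."""
    xs = sorted({r[0] for r in rects} | {r[1] + 1 for r in rects})
    ys = sorted({r[2] for r in rects} | {r[3] + 1 for r in rects})
    for x in xs:
        for y in ys:
            prios = [p for x1, x2, y1, y2, p in rects
                     if x1 <= x <= x2 and y1 <= y <= y2]
            if prios and prios.count(max(prios)) > 1:
                return True
    return False

overlapping = [(0, 9, 0, 9, 1), (5, 14, 5, 14, 1)]
covered = overlapping + [(5, 9, 5, 9, 2)]   # higher priority hides the overlap
```

The second example shows why an intersecting equal-priority pair is not by itself a conflict.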

3.1 Priority Queues, Lower Envelopes, and 
MST Verification 

Consider the following three problems: 

• Given is an offline sequence of $O(n)$ integer priority
queue operations: insert or delete a value in the set
$\{0, 1, \ldots, n-1\}$ and query the minimum value. How
quickly can one answer all the queries?

• Given is a set of horizontal line segments (Figure 1,
left), each endpoint of which has coordinates in the set
$\{0, 1, \ldots, n-1\}$. How quickly can one construct the
lower envelope of the line segments? That is, if we
think of each line segment as representing the graph of
a (constant) function defined over a portion of the
x-axis, what is the (piecewise constant) minimum of these
functions (Figure 1, right)?

• Given is a graph, in which the minimum spanning tree
is a given path, and in which all edges have weights in
the set $\{0, 1, \ldots, n-1\}$. How quickly can one determine,
for each edge in the path, which edge would replace it
in the MST if the path edge were deleted?

It is not difficult to see that in fact these problems 
are equivalent to each other: the insertion times, deletion 
times, and priorities in the offline priority queue correspond 
respectively to the x-coordinates of the left endpoints,
x-coordinates of the right endpoints, and y-coordinates of the
horizontal line segments, which correspond respectively to 
the first vertex (according to the path order), second vertex, 
and weight of the non-MST edges in the graph. 

Aho, Hopcroft, and Ullman [§, pp. 139-141] describe
an algorithm for a similar offline priority queue problem;
however, their problem involves delete-minimum operations
rather than deletions of particular values. Although the best
replacement edge for each non-MST edge can be found in
linear time [10], the fastest known algorithm for finding the
best replacement for each MST edge (without the integer
restriction) remains Tarjan's slightly superlinear one [12].

LEMMA 3.1. The three problems described above can be 
solved in linear time. 

Proof. We consider the minimum spanning tree verification 
formulation of the problem, and consider the non-tree edges 
in sorted order by weight. Our algorithm finds the replace- 
ments for each path edge in a certain order; when a path 
edge's replacement is found we reduce the size of the graph 
by contracting that edge. This contraction clearly does not 
change the replacement for the remaining edges. We use 
a union-find data structure to keep track of the relation be- 
tween the original graph vertices and the vertices of the con- 
tracted graph. Since the contractions will be performed along 
the edges of a fixed tree (namely, the given path), we can use 
the linear-time union-find data structure of Gabow and Tar- 
jan |J or its recent simplification by Alstrup et al. 

Our algorithm, then, simply performs the following
steps for each non-tree edge (u, v), in sorted order by edge weight: for
each uncontracted edge (x, y) remaining in the path between
u and v, set that edge's replacement to (u, v), contract the
edge, and unite x and y in the union-find data structure.

The time per edge (u, v) is a constant, plus a term 
proportional to the number of path edges contracted as a 
result of processing edge (u, v). Since each edge can only 
be contracted once, the total time is linear. □ 
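A minimal sketch of this procedure follows, with two substitutions that should be read as assumptions: a comparison sort replaces the linear-time integer sort, and an ordinary union-find with path halving replaces the linear-time structure, so the sketch is near-linear rather than linear. Path vertices are numbered 0..m, path edge j joins j and j+1, and each component is represented by its rightmost vertex. The name `replacement_edges` is illustrative.

```python
def replacement_edges(m, edges):
    """m path edges (j, j+1) for j in 0..m-1; edges are non-tree edges
    (u, v, w) with 0 <= u < v <= m. Returns, for each path edge, the
    lightest non-tree edge spanning it (None if there is none)."""
    parent = list(range(m + 1))   # parent pointers always point rightward

    def find(x):
        # Root = rightmost vertex of the contracted component containing x.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    repl = [None] * m
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        x = find(u)
        while x < v:              # path edge (x, x+1) is still uncontracted
            repl[x] = (u, v, w)
            parent[x] = x + 1     # contract it
            x = find(x)
    return repl

demo_repl = replacement_edges(5, [(0, 3, 5), (1, 5, 2), (2, 4, 7)])
```

Read as a lower envelope, a segment with left endpoint u, right endpoint v, and height w plays the role of the edge (u, v, w), and `repl[j]` is the lowest segment over the unit interval from j to j+1.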

The technique readily extends to finding best replace- 
ment edges for graphs where the MST is not a path. 

For our application to conflict detection, we also need 
to know whether there were any ambiguities in the above 
process; that is, whether any of the offline min operations 
can return more than one equal minimum value. This 
is essentially the same as our original conflict detection 
problem in one dimension rather than two. One way to 
solve this is to apply the above algorithm twice, once with an 
arbitrary tie-breaking order imposed on equal weight edges, 
and once again with the reverse order imposed, and test 
whether the two applications of the algorithm produce the 
same assignment of replacement edges. 
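The two-pass test can be sketched as follows, phrased for the lower-envelope formulation. Segments are triples (left, right, y) covering unit cells left..right-1 of a domain with m cells; ties in y are broken by segment index in one pass and by reversed index in the other. As before, an ordinary union-find stands in for the linear-time one, and the names `envelope_owners` and `ambiguous_cells` are hypothetical.

```python
def envelope_owners(m, segs, reverse_ties=False):
    """owner[j] = index of the lowest segment over unit cell j, or None."""
    sign = -1 if reverse_ties else 1
    order = sorted(range(len(segs)), key=lambda i: (segs[i][2], sign * i))
    parent = list(range(m + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    owner = [None] * m
    for i in order:
        left, right, _ = segs[i]
        x = find(left)
        while x < right:
            owner[x] = i
            parent[x] = x + 1
            x = find(x)
    return owner

def ambiguous_cells(m, segs):
    """Cells where the minimum is attained by more than one segment."""
    forward = envelope_owners(m, segs)
    backward = envelope_owners(m, segs, reverse_ties=True)
    return [j for j in range(m) if forward[j] != backward[j]]

demo_segs = [(0, 4, 1), (2, 6, 1), (3, 5, 0)]
demo_ambiguous = ambiguous_cells(6, demo_segs)
```

In the example, the two segments of height 1 tie only over cell 2, because the segment of height 0 lies below both of them over cells 3 and 4.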

3.2 Stripes 

We first describe an efficient algorithm for conflict detection 
in the special case that each rectangle is a stripe; that is, 
either its vertical extent or its horizontal extent is the entire 
space [0, U — 1]. We do not expect such a restricted case to 
occur in our application, but it forms an important subroutine 
for our more general algorithm. 







Figure 1: A set of horizontal line segments (left) and its lower envelope (right). 



We classify stripes into three types: 

• A horizontal stripe has x-extent [0, U — 1] and y-extent 
a proper subset of [0, U — 1]. 

• A vertical stripe has x-extent a proper subset of [0, U — 
1] and y-extent [0, U — 1]. 

• A universal stripe has both x-extent and y-extent equal 
to the entire space [0, U — 1]. 

LEMMA 3.2. Let a collection of prioritized stripes be given, 
together with sorted orderings of all stripes according 
to their priorities, the horizontal boundaries of horizon- 
tal stripes according to their y-coordinates, and the ver- 
tical boundaries of vertical stripes according to their x- 
coordinates. Then we can detect a conflict in this set of 
stripes in linear time. 

Proof. We first partition the space $[0, U-1]^2$ into horizontal
stripes, according to the maximum-priority horizontal input
stripe covering each point in the space; essentially this is just
the lower envelope computation of Lemma 3.1. Let $m_h$
denote the minimum priority occurring in this partition.
Similarly, we partition the space into vertical stripes according
to the maximum-priority vertical input stripe covering each
point, and let $m_v$ denote the minimum priority occurring in
this partition. Finally, we let $m_u$ denote the maximum priority
of any universal stripe. (We set $m_h$, $m_v$, or $m_u$ to $-\infty$ if
the corresponding set of stripes is empty.)

We then use this information to search for conflicts, 
as follows, depending on the types of the two conflicting 
stripes: 

• To find a conflict between two horizontal stripes, if
one exists, test whether there exists an ambiguity in
the construction of the horizontal partition, as discussed
below Lemma 3.1. If there is such an ambiguity, let $p_h$
denote the maximum priority of any ambiguity. Then
a conflict exists if and only if $p_h \ge \max\{m_v, m_u\}$.
Similarly we can find a conflict between two vertical
stripes by letting $p_v$ denote the maximum priority of an
ambiguity in the vertical partition, and testing whether
$p_v \ge \max\{m_h, m_u\}$.

• A conflict between two universal stripes exists if and
only if some two or more universal stripes have priority
$m_u$, and if $m_u \ge \max\{m_h, m_v\}$.

• A conflict between a universal and a horizontal stripe
exists if and only if $m_u$ is also the priority of one of
the stripes in the horizontal partition, and $m_u \ge m_v$.
Similarly a conflict between a universal and a vertical
stripe exists if and only if $m_u$ is also the priority of one
of the stripes in the vertical partition, and $m_u \ge m_h$.

• A conflict between a horizontal stripe and a vertical
stripe exists if and only if there is a priority $p \ge m_u$ that
appears both in the horizontal and the vertical partition.

Thus, the problem has been reduced to a constant num- 
ber of comparisons, together with two more complex opera- 
tions: determining whether m u appears in either of two sets 
of priorities, and determining the intersection of those two 
sets. Since we know the sorted order of the priorities, we 
can represent them by values in the range [0, n — 1] and use a 
simple bitmap to perform these membership and intersection 
tests in linear total time. □ 

3.3 kD-tree 

A kD-tree [||] of a set of points is a hierarchical partition into 
rectangular cells, formed as follows: 

• The root of the hierarchy is a bounding box for the point 
set. 

• If a cell at an even level of the hierarchy contains one
or more points in its interior, then it is split into two
smaller cells by a vertical line through the point with
the median x-coordinate.

• If a cell at an odd level of the hierarchy contains one
or more points in its interior, then it is split into two
smaller cells by a horizontal line through the point with
the median y-coordinate.

Figure 2: kD-tree for a set of rectangles. Upper left: the rectangles. Upper right: their vertices. Lower left: kD-tree for the
vertices. Lower right: maximal kD-tree cells partition an input rectangle.

If a cell contains an even number of points, either of the
two median points can be used to determine its split line. The
leaf cells of the kD-tree form a partition of the bounding box
into $O(n)$ empty rectangles. Since each split divides its point
set in half, the number of levels of the hierarchy is at most
$\log_2 n$ (it can be smaller if several points are contained in a
single split line).

LEMMA 3.3. Any vertical or horizontal line cuts $O(\sqrt{n})$
cells at all levels of the kD-tree for a set of $O(n)$ points.

Proof. If the line is horizontal (vertical), the number of cells 
cut by the line at most doubles at every even (odd) level of the 
kD-tree construction, and remains unchanged at every odd 
(even) level. The result follows from the $\log_2 n$ bound on the
number of levels in the tree. □ 
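The counting step in this proof can be spelled out as follows. The symbols $c_i$ (number of cells at level $i$ cut by the line) and $L$ (number of levels, with $L \le \log_2 n$) are introduced here only for illustration. The count starts at one cell at the root and at most doubles once in every two consecutive levels, so each value of $\lceil i/2 \rceil$ occurs for at most two levels $i$:

```latex
\[
  c_i \le 2^{\lceil i/2 \rceil}
  \quad\Longrightarrow\quad
  \sum_{i=0}^{L} c_i
  \;\le\; 2 \sum_{j=0}^{\lceil L/2 \rceil} 2^{j}
  \;<\; 2 \cdot 2^{\lceil L/2 \rceil + 1}
  \;\le\; 2^{L/2 + 3}
  \;\le\; 8\sqrt{n}
  \;=\; O(\sqrt{n}).
\]
```

For a set of $O(n)$ points the same argument applies with a larger constant factor.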

In our conflict detection algorithm, we will use kD-trees 
defined on the set of corners of the input rectangles 
(Figure 2). We say that an input rectangle covers a kD-tree 
cell if the cell is completely contained in the rectangle. We 
define a maximal covered cell for a given rectangle to be a 
cell that is covered by the rectangle, but for which the cell's 
parent is not covered. We say that a rectangle crosses a cell 
if it has a nonempty intersection with the interior of the cell 
but does not cover it. 
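These three definitions translate directly into coordinate comparisons. The sketch below assumes closed axis-parallel rectangles and cells given as tuples `(x0, y0, x1, y1)`, and tree nodes given as dictionaries with `"box"` and `"children"` keys; the names `classify` and `maximal_covered` are hypothetical.

```python
def classify(rect, cell):
    """Return 'covers', 'crosses', or 'disjoint' for `rect` relative to `cell`."""
    rx0, ry0, rx1, ry1 = rect
    cx0, cy0, cx1, cy1 = cell
    if rx0 <= cx0 and ry0 <= cy0 and rx1 >= cx1 and ry1 >= cy1:
        return "covers"                  # cell lies entirely inside rect
    if rx0 < cx1 and cx0 < rx1 and ry0 < cy1 and cy0 < ry1:
        return "crosses"                 # meets the cell interior, no cover
    return "disjoint"


def maximal_covered(rect, node):
    """Boxes of the cells covered by `rect` whose parent is not covered."""
    kind = classify(rect, node["box"])
    if kind == "covers":
        return [node["box"]]             # descendants are covered, not maximal
    if kind == "disjoint":
        return []
    # A crossed cell: any maximal covered cells lie among its descendants.
    return [b for child in node["children"] for b in maximal_covered(rect, child)]
```

Note that the search only descends through crossed cells, which mirrors the observation that the parent of a maximal covered cell is always crossed.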

LEMMA 3.4. Any rectangle has $O(\sqrt{n})$ crossed cells and 
$O(\sqrt{n})$ maximal covered cells at all levels in the kD-tree. 

Proof. The bound on the number of crossed cells follows 
immediately from Lemma 3.3, since every crossed cell is cut 
by one of the four sides of the rectangle. The parent of a maximal 
covered cell must be crossed, and each crossed cell can 
have at most one maximal covered child, so the number of 
maximal covered cells is also $O(\sqrt{n})$. □ 

3.4 The Conflict Detection Algorithm 

Clearly, if a set of rectangles has a conflict, then this conflict 
must occur within at least one of the leaf cells of a kD-tree. 
Further, since the leaf cells contain no rectangle corners, 
each rectangle acts like a stripe within any such cell: it 
extends either the full width or the full height of the cell. 
Thus, we can perform conflict detection by building a kD- 
tree and applying our stripe conflict detection algorithm to 
each cell. 

THEOREM 3.1. Given a set of n prioritized rectangles, we 
can determine whether the set has a conflict in time $O(n^{3/2})$ 
and space $O(n)$. 



Proof. We build a kD-tree of the rectangle vertices (this can 
be done in time $O(n \log n)$) and perform a depth-first traversal 
of the tree. As we traverse the tree, we maintain at each cell 
of the traversal the following information: 

• The maximum priority of a rectangle covering the cell, 
and one or (if they exist) two rectangles having that 
maximum priority. 

• A list of the rectangles crossing the cell, sorted by 
priority. 

• A sorted list of the horizontal boundaries of the rectan- 
gles that cross the cell. 

• A sorted list of the vertical boundaries of the rectangles 
that cross the cell. 

When the traversal reaches a cell C, we can determine 
which rectangles cross or cover the children of C, and 
extract the sorted sublists for its two children, in time linear 
in the number of rectangles crossing C. We also find the 
set of rectangles that cross C but maximally cover one of 
its children, scan this set for the maximum priority, and use 
this information (together with the maximum priority of a 
rectangle covering C) to determine the maximum priority of 
a rectangle covering each child. 
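The step of passing information from a cell to its children might look like the sketch below. It is a simplification under several assumptions: priorities are numbers with larger meaning higher, rectangles and cells are tuples `(x0, y0, x1, y1)`, only the maximum covering priority is tracked (not the one or two rectangles attaining it), and the sorted boundary lists are omitted. The names `relation` and `push_down` are hypothetical.

```python
def relation(rect, cell):
    """'covers', 'crosses', or 'disjoint' for a closed rect and a cell."""
    rx0, ry0, rx1, ry1 = rect
    cx0, cy0, cx1, cy1 = cell
    if rx0 <= cx0 and ry0 <= cy0 and rx1 >= cx1 and ry1 >= cy1:
        return "covers"
    if rx0 < cx1 and cx0 < rx1 and ry0 < cy1 and cy0 < ry1:
        return "crosses"
    return "disjoint"


def push_down(children, crossing, max_cover):
    """Compute, for each child box, its crossing list and max covering priority.

    children  -- list of child cell boxes
    crossing  -- (priority, rect) pairs crossing the parent, sorted by priority
    max_cover -- highest priority of a rectangle covering the parent, or None
    """
    result = []
    for box in children:
        sub, best = [], max_cover        # whatever covers the parent covers the child
        for prio, rect in crossing:      # one linear scan per child
            kind = relation(rect, box)
            if kind == "crosses":
                sub.append((prio, rect)) # scan order keeps the sublist sorted
            elif kind == "covers" and (best is None or prio > best):
                best = prio              # rect maximally covers this child
        result.append((sub, best))
    return result
```

Because each child's sublist is extracted by a single scan of the parent's list, the sorted order is inherited without re-sorting, and the work at a cell is linear in the number of rectangles crossing it.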

When the traversal reaches a leaf cell, we apply the 
algorithm of Lemma 3.2 to test whether the cell contains a 
conflict. 

While one child of a cell C is being processed recur- 
sively, we store with C only the portions of the sorted lists 
that have not been passed to that child, so that each rectan- 
gle or rectangle edge is stored in one of the lists only at a 
single level of the tree, keeping the total space linear. All 
operations performed when traversing a cell take time linear 
in the number of rectangles crossing or maximally covering 
the cell, so by Lemma 3.4 the total time is $O(n\sqrt{n})$. □ 



4 Concluding Remarks 

We have considered the two fundamental packet filter man- 
agement problems in IP networks, namely, packet classifica- 
tion and filter conflict detection, for the two dimensional case 
of immediate interest. For the packet classification problem, 
we present a simple algorithm that takes $O(\log \log n)$ time 
to classify packets, matching the best known bounds for the 
one dimensional case, and improving upon the space needed 
by currently known solutions. For the filter conflict detec- 
tion problem, our solution is the first sub-quadratic time al- 
gorithm. 

Our packet classification algorithm may well turn out 
to be better than existing ones in practice, too. We fully 
intend to test that possibility. However, the task is not one of 
merely implementing our algorithm and comparing against 
the known ones. Since the study of packet classification is 
quite mature in the networking communities, we need to do 
a careful job adapting our solution (where to make best use 
of large memory cache lines, how to combine hardware and 
software solutions, how to exploit the properties of rule sets 
to isolate small, hard subproblems where our solution will be 
useful, etc.). Engineering such tradeoffs is best explored in a 
separate paper. 

Dynamic versions of the packet filter management problem 
are open, as are extensions of our results to 
higher-dimensional query problems. 